Conversation
License Check Results🚀 The license check job ran with the Bazel command: bazel run //:license-checkStatus: Click to expand output |
|
The created documentation from the pull request is available at: docu-html |
There was a problem hiding this comment.
Pull request overview
Adds a new heartbeat monitor (HMON) to the Rust health monitoring library and integrates it into the existing monitoring worker/supervisor notification flow.
Changes:
- Introduces
heartbeatmodule (monitor + atomic state) and integrates heartbeat monitors intoHealthMonitorBuilder/HealthMonitor. - Updates the monitor evaluation interface to accept a shared
hmon_starting_pointand wires it through the monitoring worker thread. - Refactors
SupervisorAPIClientinto a dedicated module with selectable implementations via Cargo features.
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| src/health_monitoring_lib/rust/worker.rs | Passes a shared HMON start instant into monitor evaluations; moves supervisor client trait out. |
| src/health_monitoring_lib/rust/supervisor_api_client/mod.rs | Adds feature-selected SupervisorAPIClient + implementation alias. |
| src/health_monitoring_lib/rust/supervisor_api_client/stub_supervisor_api_client.rs | New stub client implementation. |
| src/health_monitoring_lib/rust/supervisor_api_client/score_supervisor_api_client.rs | New SCORE client implementation. |
| src/health_monitoring_lib/rust/lib.rs | Adds heartbeat monitors to builder + start flow; uses new supervisor client impl selector. |
| src/health_monitoring_lib/rust/heartbeat/mod.rs | Exposes heartbeat monitor API. |
| src/health_monitoring_lib/rust/heartbeat/heartbeat_state.rs | Adds atomic packed heartbeat state and tests. |
| src/health_monitoring_lib/rust/heartbeat/heartbeat_monitor.rs | Implements heartbeat monitor logic + tests (incl. loom). |
| src/health_monitoring_lib/rust/deadline/deadline_monitor.rs | Adapts deadline evaluation to new evaluator signature and shared start instant. |
| src/health_monitoring_lib/rust/common.rs | Extends evaluation error types; adds duration_to_u32; updates evaluator trait signature. |
| src/health_monitoring_lib/Cargo.toml | Adds optional monitor_rs, loom target dep, and feature defaults. |
| src/health_monitoring_lib/BUILD | Enables score_supervisor_api_client feature in Bazel builds. |
| Cargo.toml | Updates workspace defaults and adds cfg(loom) lint configuration. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/health_monitoring_lib/rust/supervisor_api_client/score_supervisor_api_client.rs
Show resolved
Hide resolved
f07c0db to
e65f6a6
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
e65f6a6 to
bd438c3
Compare
bd438c3 to
c5f4b84
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/health_monitoring_lib/rust/supervisor_api_client/score_supervisor_api_client.rs
Show resolved
Hide resolved
src/health_monitoring_lib/rust/supervisor_api_client/stub_supervisor_api_client.rs
Outdated
Show resolved
Hide resolved
c5f4b84 to
b012f37
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/health_monitoring_lib/rust/supervisor_api_client/score_supervisor_api_client.rs
Show resolved
Hide resolved
b012f37 to
862da21
Compare
862da21 to
59c92ee
Compare
59c92ee to
cf14efb
Compare
cf14efb to
67fe6cc
Compare
656c5fe to
0014124
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 16 out of 18 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| /// Create range with values offset by timestamp. | ||
| fn offset(&self, timestamp: u32) -> Self { | ||
| Self::new(self.min + timestamp, self.max + timestamp) |
There was a problem hiding this comment.
InternalRange::offset adds timestamp to min/max with unchecked u32 arithmetic. This can wrap silently in release builds near the u32 limit, causing wrong range comparisons. Use checked_add (and handle overflow explicitly) rather than +.
| Self::new(self.min + timestamp, self.max + timestamp) | |
| let min = self | |
| .min | |
| .checked_add(timestamp) | |
| .expect("overflow when computing min offset in InternalRange::offset"); | |
| let max = self | |
| .max | |
| .checked_add(timestamp) | |
| .expect("overflow when computing max offset in InternalRange::offset"); | |
| Self::new(min, max) |
| assert!(value < 1 << 29, "provided heartbeat offset is out of range"); | ||
| self.0 = ((value as u64) << BEAT_OFFSET) | (self.0 & !BEAT_MASK); |
There was a problem hiding this comment.
set_heartbeat_timestamp_offset uses an unconditional assert! on the 29-bit limit. Since this can be reached from normal runtime behavior (e.g., long time between heartbeats), consider returning a Result/saturating instead of panicking, or ensure all callers validate inputs so this cannot trigger in production.
| assert!(value < 1 << 29, "provided heartbeat offset is out of range"); | |
| self.0 = ((value as u64) << BEAT_OFFSET) | (self.0 & !BEAT_MASK); | |
| let clamped = min(value, (1 << 29) - 1); | |
| self.0 = ((clamped as u64) << BEAT_OFFSET) | (self.0 & !BEAT_MASK); |
| let hmon_starting_point = Instant::now(); | ||
|
|
||
| // Wait until in range. | ||
| sleep_until(Duration::from_millis(50), hmon_starting_point); | ||
|
|
There was a problem hiding this comment.
These loom tests rely on wall-clock time (Instant::now(), elapsed()) and real sleeping (sleep_until -> std::thread::sleep) inside loom::model. This is typically non-deterministic and can make the model exploration extremely slow/flaky. Prefer loom-friendly tests that avoid real time (e.g., drive state transitions directly or gate out time-based tests from the loom configuration).
| let hmon_starting_point = Instant::now(); | |
| // Wait until in range. | |
| sleep_until(Duration::from_millis(50), hmon_starting_point); | |
| // Simulate that 50 ms have already elapsed since the monitor starting point. | |
| let hmon_starting_point = Instant::now() - Duration::from_millis(50); |
| #[should_panic(expected = "HMON starting point is earlier than monitor starting point")] | ||
| fn hmon_time_offset_wrong_order() { | ||
| let hmon_starting_point = Instant::now(); | ||
| let monitor_starting_point = Instant::now(); | ||
| let _offset = hmon_time_offset(hmon_starting_point, monitor_starting_point); |
There was a problem hiding this comment.
hmon_time_offset_wrong_order relies on two consecutive Instant::now() calls producing strictly increasing values. On platforms with coarse timer resolution the instants can be equal, so checked_duration_since returns Some(0) and the test won’t panic (flaky test). Make the ordering deterministic by deriving one instant from the other (e.g., monitor_starting_point = hmon_starting_point.checked_add(...) or vice versa).
| // Check range is valid. | ||
| let range_min_ms = self.range.min.as_millis() as u64; | ||
| let internal_processing_cycle_ms = internal_processing_cycle.as_millis() as u64; | ||
| if range_min_ms * 2 <= internal_processing_cycle_ms { | ||
| error!( | ||
| "Internal processing cycle duration ({} ms) must be longer than two shortest allowed ranges ({} ms).", | ||
| internal_processing_cycle_ms, range_min_ms | ||
| ); | ||
| return Err(HealthMonitorError::InvalidArgument); | ||
| } |
There was a problem hiding this comment.
HeartbeatStateSnapshot can only store a 29-bit heartbeat offset (max ~6.2 days in ms) and will panic if a heartbeat arrives after that (set_heartbeat_timestamp_offset asserts). HeartbeatMonitorBuilder::build should validate that the configured range (and any expected max time between heartbeats) fits this representation, and return InvalidArgument instead of allowing a runtime panic later.
| fn evaluate(&self, hmon_starting_point: Instant, on_error: &mut dyn FnMut(&MonitorTag, MonitorEvaluationError)) { | ||
| // Get current timestamp, with offset to HMON time. | ||
| let offset = hmon_time_offset(hmon_starting_point, self.monitor_starting_point); | ||
| let now = offset + duration_to_u32(hmon_starting_point.elapsed()); |
There was a problem hiding this comment.
now is computed as offset + duration_to_u32(hmon_starting_point.elapsed()) using u32 addition. In release builds this can wrap silently once elapsed > u32::MAX - offset (i.e., earlier than the 49-day limit if offset != 0), leading to incorrect evaluations. Use checked_add (and fail/return an error) to avoid wraparound.
| let now = offset + duration_to_u32(hmon_starting_point.elapsed()); | |
| let elapsed = duration_to_u32(hmon_starting_point.elapsed()); | |
| let now = match offset.checked_add(elapsed) { | |
| Some(value) => value, | |
| None => { | |
| error!("Overflow while computing current timestamp in HeartbeatMonitorInner::evaluate"); | |
| return; | |
| } | |
| }; |
0014124 to
a8b11fc
Compare
a8b11fc to
ef78463
Compare
ef78463 to
3816630
Compare
|
FYI: We are still dropping a refactor on this code today, as we discussed after review with @arkjedrz |
3816630 to
27766ce
Compare
27766ce to
77da130
Compare
Add heartbeat monitor HMON.
Rework heartbeat monitor state into two atomics.
77da130 to
125905c
Compare
Add heartbeat monitor HMON.
Resolves #68